finite-time guarantee


We thank all the reviewers for their encouraging comments

Neural Information Processing Systems

We thank all the reviewers for their encouraging comments. In both these cases, τ is effectively zero. Liu et al. show how GTD-class algorithms can be formally derived from a primal-dual saddle-point formulation. Sutton et al. present a (single time-scale) variant of linear TD learning, which they call emphatic TD. They also provide an asymptotic convergence analysis to the set of local optima. If the paper is accepted, we will work further on improving the clarity of the work.
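
For context, the primal-dual view mentioned above is the standard saddle-point reformulation of the mean-squared projected Bellman error (MSPBE); the notation below (with $A$, $b$, $C$ as the usual feature expectations) is ours, added only as a reminder, and is not part of the rebuttal:
\[
\tfrac{1}{2}\,\mathrm{MSPBE}(\theta)
  \;=\; \tfrac{1}{2}\,\lVert b - A\theta \rVert_{C^{-1}}^{2}
  \;=\; \max_{w}\;\Bigl(\langle w,\; b - A\theta\rangle \;-\; \tfrac{1}{2}\, w^{\top} C\, w\Bigr),
\qquad
A = \mathbb{E}\bigl[\phi(\phi - \gamma\phi')^{\top}\bigr],\;
b = \mathbb{E}[r\,\phi],\;
C = \mathbb{E}\bigl[\phi\phi^{\top}\bigr],
\]
so GTD-style updates can be read as stochastic gradient descent in $\theta$ and ascent in $w$.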


Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems

Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences, to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite-time analysis for general PbRL problems. We first show that for PbRL, a unique optimal policy may not exist if preferences over trajectories are deterministic. If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search. Experiments show the efficacy of our method when it is applied to real-world problems.
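
To make the stochastic-preference assumption concrete: a common modeling choice (used here purely for illustration; the logistic link and the function names are our assumptions, not necessarily the paper's) is to let the probability of preferring one trajectory over another grow with the gap in their hidden cumulative rewards, as in a Bradley-Terry model. A minimal sketch of such a preference oracle:

    import numpy as np

    def trajectory_return(rewards):
        """Hidden cumulative reward of a trajectory (unknown to the learner)."""
        return float(np.sum(rewards))

    def preference_oracle(rewards_a, rewards_b, scale=1.0, rng=None):
        """Sample a stochastic preference between two trajectories.

        Assumes a Bradley-Terry / logistic link: the probability that
        trajectory A wins grows with its hidden return advantage.
        Returns 1 if A is preferred, 0 otherwise.
        """
        rng = np.random.default_rng() if rng is None else rng
        gap = scale * (trajectory_return(rewards_a) - trajectory_return(rewards_b))
        p_a = 1.0 / (1.0 + np.exp(-gap))
        return int(rng.random() < p_a)

    # Example: 100 noisy duels between two short trajectories.
    wins_a = sum(preference_oracle([1.0, 0.5, 0.0], [0.2, 0.2, 0.2]) for _ in range(100))
    print(f"trajectory A preferred in {wins_a}/100 duels")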


Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems

Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fits observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested structure: the inner loop finds the optimal policy given parametrized rewards, while the outer loop updates the estimates towards optimizing a measure of fit. For high-dimensional environments, such a nested-loop structure entails a significant computational burden. To reduce the computational burden of the nested loop, novel methods such as SQIL \cite{reddy2019sqil} and IQ-Learn \cite{garg2021iq} emphasize policy estimation at the expense of reward estimation accuracy. However, without accurately estimated rewards, it is not possible to do counterfactual analysis such as predicting the optimal policy under different environment dynamics and/or learning new tasks.
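
The nested-loop pattern criticized here can be made concrete with a small tabular sketch. This is an illustrative maximum-likelihood IRL skeleton under a soft/MaxEnt policy model with a finite-difference outer gradient; these modeling choices are ours and are not the methods discussed in the abstract:

    import numpy as np

    def soft_value_iteration(reward, P, gamma=0.9, n_iter=300):
        """Inner loop: solve for the soft-optimal (softmax) policy of a tabular MDP.

        reward: (S, A) reward table; P: (S, A, S) transition probabilities.
        """
        V = np.zeros(P.shape[0])
        for _ in range(n_iter):
            Q = reward + gamma * P @ V
            V = np.log(np.exp(Q).sum(axis=1))      # soft maximum over actions
        Q = reward + gamma * P @ V
        return np.exp(Q - V[:, None])              # (S, A) stochastic policy

    def expert_log_likelihood(theta, expert_sa, features, P):
        """Log-likelihood of expert (state, action) pairs; needs a full policy solve."""
        policy = soft_value_iteration(features @ theta, P)
        return sum(np.log(policy[s, a]) for s, a in expert_sa)

    def nested_loop_irl(expert_sa, features, P, n_outer=50, lr=0.1, eps=1e-4):
        """Outer loop: ascend the likelihood; every finite-difference gradient
        coordinate triggers another inner-loop policy solve, which is the
        computational burden the abstract refers to."""
        theta = np.zeros(features.shape[-1])
        for _ in range(n_outer):
            base = expert_log_likelihood(theta, expert_sa, features, P)
            grad = np.zeros_like(theta)
            for j in range(theta.size):
                e = np.zeros_like(theta)
                e[j] = eps
                grad[j] = (expert_log_likelihood(theta + e, expert_sa, features, P) - base) / eps
            theta += lr * grad
        return theta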



Review for NeurIPS paper: Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems

This paper generated considerable discussion among the reviewers. On the positive side, the paper makes a solid contribution to the emerging literature on preference-based RL, a topic of some importance, and offers some interesting insights (e.g., on the potential lack of a "winning policy") and novel algorithmic contributions. Conversely, some reviewers raised issues with some of the assumptions made in the paper and with the presentation (which seems to assume familiarity with PbRL and its motivations/rationale). The author response was thoughtful and generated some discussion (some of which is not reflected in the reviews, a couple of which unfortunately failed to get updated). On my own reading of the paper, I agree that it makes a useful contribution to PbRL, especially from a technical and conceptual perspective (although I don't believe it makes PbRL more practical at this stage).


Review for NeurIPS paper: Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems

Weaknesses: There are two main weaknesses. First, I'm not sure whether the algorithm or the analysis is meant to be the core contribution. If it's the algorithm, then the paper needs to actually test the algorithm in more than toy settings (and ideally with real humans, rather than simulating answers with a Bradley-Terry-Luce (BTL) model under two parameter settings). But if it's the analysis, I almost feel the experiments are distracting, or at least overstated and drawing attention away from the main contributions. I'd love to hear the authors' perspective on this, but my suggestion would be to either a) get the best of both worlds by running a more serious experiment, or b) edit the paper to highlight the analysis and justify the experiments as showing what the algorithm does empirically, perhaps with some qualitative analysis of the resulting behavior on simple tasks to aid understanding of the algorithm.



Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

Neural Information Processing Systems

Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
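
For readers who want to see the basic mechanism, the loop below is a generic simulated-annealing sketch on a bounded continuous (box) domain; the Gaussian random-walk proposal and logarithmic cooling schedule are illustrative choices on our part, not the specific formulation analyzed in the paper:

    import numpy as np

    def simulated_annealing(f, lower, upper, n_iter=5000, step=0.1, t0=1.0, seed=0):
        """Minimize f over the box [lower, upper] with a Metropolis random walk
        whose temperature is gradually lowered."""
        rng = np.random.default_rng(seed)
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        x = rng.uniform(lower, upper)
        fx = f(x)
        best_x, best_f = x.copy(), fx
        for k in range(1, n_iter + 1):
            temp = t0 / np.log(k + 1)                                  # slow cooling
            prop = np.clip(x + step * rng.standard_normal(x.shape), lower, upper)
            fp = f(prop)
            # Always accept improvements; accept uphill moves with Boltzmann probability.
            if fp <= fx or rng.random() < np.exp(-(fp - fx) / temp):
                x, fx = prop, fp
                if fx < best_f:
                    best_x, best_f = x.copy(), fx
        return best_x, best_f

    # Example: a multimodal (Rastrigin-like) test function on [-5, 5]^2.
    x_star, f_star = simulated_annealing(
        lambda z: np.sum(z ** 2) + 10 * np.sum(1 - np.cos(2 * np.pi * z)),
        lower=[-5, -5], upper=[5, 5])
    print(x_star, f_star)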


Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Zeng, Siliang, Li, Chenliang, Garcia, Alfredo, Hong, Mingyi

arXiv.org Artificial Intelligence

Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fits observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested structure: the inner loop finds the optimal policy given parametrized rewards, while the outer loop updates the estimates towards optimizing a measure of fit. For high-dimensional environments, such a nested-loop structure entails a significant computational burden. To reduce the computational burden of the nested loop, novel methods such as SQIL [1] and IQ-Learn [2] emphasize policy estimation at the expense of reward estimation accuracy. However, without accurately estimated rewards, it is not possible to do counterfactual analysis such as predicting the optimal policy under different environment dynamics and/or learning new tasks. In this paper we develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm provably converges to a stationary solution with a finite-time guarantee. If the reward is parameterized linearly, we show that the identified solution corresponds to the solution of the maximum entropy IRL problem. Finally, by using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
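
To contrast with the nested-loop pattern, the single-loop idea described here, following each incremental policy-improvement step with a single stochastic gradient step on the reward, can be sketched roughly as below. This is a hypothetical skeleton: the soft Bellman backup, the softmax policy, and the approximate likelihood-gradient step are our illustrative choices, not the authors' actual updates:

    import numpy as np

    def single_loop_irl(expert_sa, features, P, gamma=0.9, n_iter=2000,
                        lr_reward=0.05, seed=0):
        """Alternate ONE soft policy-improvement step with ONE stochastic reward
        update, instead of fully re-solving the policy at every reward change.

        expert_sa: list of (state, action) pairs; features: (S, A, d) array;
        P: (S, A, S) transition probabilities.
        """
        rng = np.random.default_rng(seed)
        S, A, d = features.shape
        theta = np.zeros(d)
        V = np.zeros(S)
        for _ in range(n_iter):
            reward = features @ theta                    # current reward table (S, A)
            # Single soft Bellman backup = incremental policy improvement.
            Q = reward + gamma * P @ V
            V = np.log(np.exp(Q).sum(axis=1))
            policy = np.exp(Q - V[:, None])              # softmax policy
            # Approximate likelihood step on one sampled expert transition:
            # raise the expert action's feature weight, lower the policy's expectation.
            s, a = expert_sa[rng.integers(len(expert_sa))]
            grad = features[s, a] - policy[s] @ features[s]
            theta += lr_reward * grad
        return theta, policy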